Drawing Sound Conclusions from Noisy Judgments

Authors

David Goldberg

Andrew Trotman

Xiao Wang

Wei Min

Zongru Wan

Abstract

The quality of a search engine is typically evaluated using hand-labeled data sets, where the labels indicate the relevance of documents to queries. Often the number of labels needed is too large to be produced by the best annotators, so less accurate labels (e.g., from crowdsourcing) must be used. This introduces errors in the labels, and thus errors in standard precision metrics (such as P@k and DCG): the lower the quality of the judge, the more errors in the labels and, consequently, the less accurate the metric. We introduce equations and algorithms that adjust the metrics to the values they would have had if there were no annotation errors. This is especially important when two search engines are compared by comparing their metrics. We give examples where one engine appeared to be statistically significantly better than the other, but the effect disappeared after the metrics were corrected for annotation error. In other words, the evidence supporting a statistical difference was illusory, caused by a failure to account for annotation error.

CCS Concepts: • Information systems → Presentation of retrieval results
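The abstract does not spell out the authors' equations, so the following is only a minimal sketch of the general idea for binary relevance and P@k, not the paper's method. It assumes a judge whose error rates are known (for example, estimated against a small gold set labeled by expert annotators) and are the same for every document: sensitivity s = P(judged relevant | truly relevant) and specificity f = P(judged not relevant | truly not relevant). Then the expected noisy P@k is s·p + (1 − f)·(1 − p), which can be solved for the true p. This is the classic Rogan-Gladen misclassification correction; the names `JudgeErrorRates`, `observed_precision` and `corrected_precision` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeErrorRates:
    """Hypothetical judge model, assumed known and identical for all documents."""
    sensitivity: float  # P(judge says relevant | truly relevant)
    specificity: float  # P(judge says not relevant | truly not relevant)


def observed_precision(labels: list[int], k: int) -> float:
    """P@k computed directly from (possibly noisy) 0/1 labels of the ranked results."""
    if k <= 0 or len(labels) < k:
        raise ValueError("need k > 0 and at least k labels")
    return sum(labels[:k]) / k


def corrected_precision(noisy_p: float, rates: JudgeErrorRates) -> float:
    """Solve E[noisy_p] = s * p + (1 - f) * (1 - p) for the error-free p.

    Rogan-Gladen-style correction. The result is clipped to [0, 1] because
    sampling noise can push the raw estimate outside the valid range.
    """
    youden = rates.sensitivity + rates.specificity - 1.0
    if youden <= 0:
        raise ValueError("judge must beat chance: sensitivity + specificity > 1")
    p = (noisy_p + rates.specificity - 1.0) / youden
    return min(1.0, max(0.0, p))


if __name__ == "__main__":
    # Illustrative crowd labels for the top 10 results of one query (made-up data).
    crowd_labels = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
    judge = JudgeErrorRates(sensitivity=0.9, specificity=0.9)

    noisy = observed_precision(crowd_labels, k=10)
    print(round(noisy, 3))                              # 0.7
    print(round(corrected_precision(noisy, judge), 3))  # 0.75
```

Because the correction divides by s + f − 1, a weaker judge both shrinks the observed gap between two engines and inflates the uncertainty of the corrected estimate. This sketch also treats s and f as exact; in practice they are themselves estimated, and that extra uncertainty would need to be propagated before comparing engines.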

Similar sources

Testing Bayesian and heuristic predictions of mass judgments of colliding objects

Mass judgments of colliding objects have been used to explore people's understanding of the physical world because they are ecologically relevant, yet people display biases that are most easily explained by a small set of heuristics. Recent work has challenged the heuristic explanation, by producing the same biases from a model that copes with perceptual uncertainty by using Bayesian inference ...

Using Geographic Information Systems (GIS) to Assess Noise Pollution in Workplace Environments: A Case Study of a Textile Factory

Background and Objective: Noise pollution causes many physiological, psychological, economic and social effects on human life. This issue is more important in the environment of industrial workplaces. This research aimed to adopt the functions of GIS for evaluating and spatial analysis of noises in industrial environments. Materials and Methods: At the initial step, the spatial data for indust...

Spatial Congruity in Audiovisual Synchrony Judgments

Rainer Guski, Dept. of Psychology, Ruhr-University Bochum, Germany. Abstract: The systematic analysis of the perception of audiovisual synchrony has shown that auditory delays are tolerated to a certain extent in synchrony judgments, but the variation of spatial separation between light and sound has brought conflicting results: While Lewald & Guski (2003) did not find a...

Debunking the Myth of Value-Neutral Virginity: Toward Truth in Scientific Advertising

The scientific community often portrays science as a value-neutral enterprise that crisply demarcates facts from personal value judgments. We argue that this depiction is unrealistic and important to correct because science serves an important knowledge generation function in all modern societies. Policymakers often turn to scientists for sound advice, and it is important for the wellbeing of s...

Noisy Newtons: Unifying process and dependency accounts of causal attribution

There is a long tradition in both philosophy and psychology to separate process accounts from dependency accounts of causation. In this paper, we motivate a unifying account that explains people’s causal attributions in terms of counterfactuals defined over probabilistic generative models. In our experiments, participants see two billiard balls colliding and indicate to what extent ball A cause...

Publication date: 2017